Studying linguistic changes on 200 years of newspapers

Authors

  • Vincent Buntinx
  • Cyril Bornet
  • Frédéric Kaplan

Abstract

This research investigates methods for studying linguistic evolution using a corpus of scanned newspapers. The corpus comprises 4 million press articles covering about 200 years of archives, and thus indirectly documents the evolution of written language. It consists of digitized facsimiles of Le Journal de Genève (1826-1997) and La Gazette de Lausanne (1804-1997). For each newspaper, the daily scanned issues were transcribed automatically using an OCR system. The whole archive represents more than 20 TB of scanned data and contains about two billion words, which puts it beyond the reach of most standard analysis techniques on regular desktop computers.

Similar resources

Studying Linguistic Changes over 200 Years of Newspapers through Resilient Words Analysis

This paper presents a methodology for analyzing linguistic changes in a given textual corpus that overcomes two common problems in corpus linguistics studies. One is the monotonic increase of the corpus size over time; the other is the presence of noise in the textual data. In addition, our method makes it possible to better target the linguistic evolution of the corpus,...

A Contrastive Study of Stance-Markers in Opinion Columns of English vs. Farsi Newspapers

This contrastive study analyzed English and Farsi newspaper opinion columns in terms of the frequency of different types of stance markers. Sixty newspaper opinion columns (30 written in English and 30 written in Farsi) from 10 widespread newspapers published in the United States and Iran in 2015 were analyzed. Hyland’s (2005) model of stance markers (hedges, boosters...

A Contrastive Analysis of Sports Headlines in Two English Newspapers

It holds true that a flourishing field of Contrastive Rhetoric (CR) research has begun to address the way various text types and/or genres may differ across cultures and languages (Corner, 1996). Very much in line with this development, this study was an attempt to characterize the linguistic structures of headlines in the sports section of 2 English newspapers: one non-Iranian (The Times) and one ...

Italian Political Communication and Gender Bias: Press Representations of Men/Women Presidents of the Houses of Parliament (1979, 1994, and 2013)

The study considers mass media communication as intertwined with social norms, as assumed by the perspective of social representations. It explores the Italian press communication by focusing on three pairs of men and women politicians with different political orientations and all serving as presidents of the Houses of Parliament in three legislatures. The article concentrates on five newspaper...

Linguistic Audit as a Professional Activity

The subject of this research is linguistic (or: language) audit. The term is new and not yet widely used. Linguistic audit, in particular, is offered as a service by linguistic consulting agencies. Modern linguistic consulting, according to the author, is a form of stimulating theoretical and practical development of linguistic ecology, a new branch of applied linguistics, ...

Publication date: 2016